Abstract

We investigate degree-degree correlations for scale-free graph sequences. The main conclusion
of this paper is that the assortativity coefficient is not the appropriate way to describe degree
dependences in scale-free random graphs. Indeed, we study the infinite volume limit of the
assortativity coefficient, and show that this limit is always non-negative when the degrees have
finite first but infinite third moment, i.e., when the degree exponent $\gamma + 1$ of the density satisfies
$\gamma \in (1,3)$. More generally, our results show that the correlation coefficient is inappropriate to
describe dependencies between random variables having infinite variance.

We start with a simple model of the sample correlation of random variables $X$ and $Y$, which are
linear combinations with non-negative coefficients of the same infinite variance random variables.
In this case, the correlation coefficient of $X$ and $Y$ is not defined, and the sample covariance
converges to a proper random variable with support that is a subinterval of $(-1,1)$. Further,
for any joint distribution $(X, Y)$ with equal marginals being non-negative power-law distributions
with infinite variance (as in the case of degree-degree correlations), we show that the limit is
non-negative. We next adapt these results to the assortativity in networks as described by the
degree-degree correlation coefficient, and show that it is non-negative in the large graph limit
when the degree distribution has an infinite third moment. We illustrate these results with
several examples where the assortativity behaves in a non-sensible way.

We further discuss alternatives for describing assortativity in networks based on rank correlations
that are appropriate for infinite variance variables. We support these mathematical results
by simulations.

Keywords. Dependencies of heavy-tailed random variables, Power-laws, Scale-free graphs,
Assortativity, Degree-degree correlations, Multivariate extremes

1 Introduction 

Large self-organizing networks, such as the Internet, the World Wide Web, social and biological 
networks, often exhibit power-law degrees. In simple words, a random variable $X$ has a power-law
distribution with tail exponent $\gamma > 0$ if its tail probability $\mathbb{P}(X > x)$ is roughly proportional to
$x^{-\gamma}$, for large enough $x$. Power-law distributions are heavy tailed since the tail probability decreases
much more slowly than negative exponential, and thus one observes extremely large values of X 
much more frequently than in the case of light tails. In the network context, such networks are 
called scale free, and the vertices having huge degrees are called hubs. Statistical analysis of complex 
networks characterized by power-law degrees has received massive attention in recent literature, see 
e.g. [10], [17], [22] for excellent surveys. Nevertheless, many fundamental open problems remain.
One of them is how to measure dependencies between network parameters. 

An important property of networks is the dependence between the degrees of direct neighbours. 
Often, this dependence is characterized by the assortativity coefficient of the network, introduced 
by Newman in [20]. The assortativity coefficient is the correlation coefficient between the vector
of degrees on each side of an edge, as a function of all the edges. See [20, Table I] for a list of
assortativity coefficients for various real-world networks. The empirical data suggest that social
networks tend to be assortative (i.e., the assortativity coefficient is positive), while technological and
biological networks tend to be disassortative. In [20, Table I], it is striking that, typically, larger
disassortative networks have an assortativity coefficient that is closer to 0 and therefore appear to
have approximately uncorrelated degrees across edges. Similar conclusions can be drawn from [21], see
in particular [21, Table II]. In this paper, we explain this effect mathematically, and conclude that
the assortativity coefficient is not the appropriate way to describe dependencies between degrees on
edges in the case of scale-free networks.

Instead, we propose a solution based on the ranks of degrees to deal with degree-degree depen- 
dencies in networks. This rank correlation approach is in fact standard and even classical in the area 
of multivariate analysis, falling under the category of 'concordance measures' - dependency measures 
based on order rather than exact values of two stochastic variables. The huge advantage of such 
dependency measures is that they work well independently of the number of finite moments of the 
degrees, while the assortativity coefficient, despite the fact that it is always in $[-1,1]$, suffers from a
strong dependence on the extreme values of the degrees. This was already noted in the 1936 paper
by H. Hotelling and M. R. Pabst [11]: 'Certainly where there is complete absence of knowledge of the
form of the bivariate distribution, and especially if it is believed not to be normal, the rank correlation
coefficient is to be strongly recommended as a means of testing the existence of relationship.' Among
recent applications of rank correlation measures, such as Spearman's rho [26] and the closely related
Kendall's tau [13], is measuring concordance between two rankings for a set of documents in web
search. In this application field many other measures for rank distances have been proposed, see
e.g. [14] and references therein.

We will show in numerical experiments that statistical estimators for degree-degree dependencies, 
based on rank correlations, are consistent. That is, for graphs of different sizes but similar structure 
(e.g. Preferential Attachment graphs of 1,000 and 100,000 nodes, respectively), these estimators give
consistent values, and the variance of the estimator decreases as the size of the graph grows. We
also show, analytically and numerically, that the assortativity coefficient does not have this
basic property when degree distributions are heavy-tailed.

The paper is organized as follows. We start with formal definitions of the sample correlation 
coefficient and the sample rank correlation in Section 2. In Section 3 we study a model with linear
dependencies and demonstrate that, when the sample size grows to infinity, the sample correlation
coefficient (assortativity) does not converge to a constant but rather to a random variable involving stable
distributions. We also verify numerically that the rank correlation provides a consistent statistical
estimator for this model. Next, in Section 4 we prove that if random variables are heavy-tailed and
non-negative, then the sample correlation coefficient never converges to a negative value. Thus, such
a sequence will never be classified as 'disassortative'. We illustrate this result by an example of two
non-negative but negatively correlated random variables. This result is extended to random graphs in
Section 5, where numerical results are also provided for the assortativity coefficient and rank correlations
in three different Configuration Models and a Preferential Attachment graph.
2 Correlations between random variables



2.1 Sample correlation coefficient 

The assortativity coefficient is in fact a statistical estimator of a correlation coefficient $\rho$ for the
degrees on the two ends of an arbitrary edge in a graph. In this section we formally define this
estimator. The correlation coefficient $\rho$ for two random variables $X$ and $Y$ with $\mathrm{Var}(X), \mathrm{Var}(Y) < \infty$
is defined by

\[ \rho = \frac{\mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}}. \]

By Cauchy-Schwarz, $\rho \in [-1, 1]$, and $\rho$ measures the linear dependence between the random variables
$X$ and $Y$. We can approximate $\rho$ from a sample by computing the sample correlation coefficient

\[ \rho_n = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X}_n)(Y_i - \bar{Y}_n)}{S_n(X)\,S_n(Y)}, \tag{2.1} \]

where

\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i \]

denote the sample averages of $(X_i)_{i=1}^{n}$ and $(Y_i)_{i=1}^{n}$, while

\[ S_n^2(X) = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2, \qquad S_n^2(Y) = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y}_n)^2 \tag{2.2} \]

denote the sample variances.

It is well known that, again under the assumption that $\mathrm{Var}(X), \mathrm{Var}(Y) < \infty$, the estimator $\rho_n$
of $\rho$ is consistent, i.e.,

\[ \rho_n \xrightarrow{\;P\;} \rho, \]

where $\xrightarrow{\;P\;}$ denotes convergence in probability. In practice, however, we tend not to know whether
$\mathrm{Var}(X), \mathrm{Var}(Y) < \infty$, since $S_n^2(X) < \infty$ and $S_n^2(Y) < \infty$ clearly always hold a.s., for any sample,
and, therefore, one might be tempted to always use the sample correlation coefficient in (2.1). In
this paper we investigate what happens to $\rho_n$ when $\mathrm{Var}(X), \mathrm{Var}(Y) = \infty$, and show that the use of
$\rho_n$ in random graphs is uninformative, and that it leads to deceptive behavior in the context of a linear
dependence, such as in (3.1) below.



2.2 Rank correlations 

For two-dimensional data $((X_i, Y_i))_{i=1}^{n}$, let $r_i^X$ and $r_i^Y$ be the rank of an observation $X_i$ and $Y_i$,
respectively, when the sample values $(X_i)_{i=1}^{n}$ and $(Y_i)_{i=1}^{n}$ are arranged in descending order. The
idea of rank correlations is to evaluate statistical dependences on the data $((r_i^X, r_i^Y))_{i=1}^{n}$, rather
than on the original data $((X_i, Y_i))_{i=1}^{n}$. Rank transformation is convenient, in particular because the
two components of the resulting vector $(r_i^X, r_i^Y)$ are realisations of identical uniform distributions,
implying many nice mathematical properties.

The statistical correlation coefficient for the ranks is known as Spearman's rho [26]:

\[ \rho_n^{\mathrm{rank}} = \frac{\sum_{i=1}^{n} \big(r_i^X - (n+1)/2\big)\big(r_i^Y - (n+1)/2\big)}{\sqrt{\sum_{i=1}^{n} \big(r_i^X - (n+1)/2\big)^2 \,\sum_{i=1}^{n} \big(r_i^Y - (n+1)/2\big)^2}} = 1 - \frac{6\sum_{i=1}^{n} l_i^2}{n^3 - n}, \tag{2.3} \]

where $l_i = r_i^X - r_i^Y$, see [11]. The mathematical properties of Spearman's rho have been extensively
investigated in the literature. In particular, if $((X_i, Y_i))_{i=1}^{n}$ consists of independent realizations
of $(X, Y)$, and the joint distribution function of $X$ and $Y$ is differentiable, then $\rho_n^{\mathrm{rank}}$ is a consistent
estimator, and its standard deviation is of the order $1/\sqrt{n}$ (see e.g. [8], where exact asymptotic
expressions are derived).
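
The two estimators of this section can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function names and the toy data are our own, and the rank routine assumes that the sample contains no ties (as is the case almost surely for continuous data).

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient rho_n, following (2.1)-(2.2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)


def descending_ranks(values):
    """Rank 1 is given to the largest value; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho as the sample correlation of the ranks."""
    return sample_correlation(descending_ranks(xs), descending_ranks(ys))


def spearman_rho_closed_form(xs, ys):
    """Closed form 1 - 6 * sum(l_i^2) / (n^3 - n), valid without ties."""
    n = len(xs)
    rank_x, rank_y = descending_ranks(xs), descending_ranks(ys)
    squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * squared / (n ** 3 - n)


# Toy data (hypothetical): one extreme pair dominates the sample correlation,
# while the ranks are unaffected by the size of the extreme values.
xs = [3.1, 0.4, 12.0, 7.5, 1.2]
ys = [2.0, 0.9, 150.0, 3.3, 0.1]
print(sample_correlation(xs, ys))
print(spearman_rho(xs, ys), spearman_rho_closed_form(xs, ys))
```

Because only the order of the observations enters, applying any strictly increasing transformation to the data leaves both rank-based values unchanged, which is the property exploited throughout the paper.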
3 Linear dependencies



It is well known that $\rho$ in general measures linear dependence between two random variables. Thus,
it is natural to check how this measure, and our proposed alternative measures, perform when the
relation between $X$ and $Y$ is described through the following linear model:

\[ X = \alpha_1 U_1 + \cdots + \alpha_m U_m, \qquad Y = \beta_1 U_1 + \cdots + \beta_m U_m, \tag{3.1} \]

where $U_j$, $j = 1, \ldots, m$, are independent identically distributed (i.i.d.) random variables with
regularly varying tail and tail exponent $\gamma$. By definition, the random variable $U$ is regularly varying
with index $\gamma > 0$ if

\[ \mathbb{P}(U > x) = L(x)\,x^{-\gamma}, \tag{3.2} \]

where $L(x)$ is a slowly varying function, that is, for $u > 0$, $L(ux)/L(x) \to 1$ as $x \to \infty$; for instance,
$L(u)$ may be equal to a constant or to $\log(u)$. Note that the random variables $X$ and $Y$ have the same
distribution when $(\beta_1, \ldots, \beta_m)$ is a permutation of $(\alpha_1, \ldots, \alpha_m)$. Our main result in this section is
the following theorem:

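Theorem 3.1 (Weak convergence of correlation coefficient). Let $((X_i, Y_i))_{i=1}^{n}$ be i.i.d. copies of the
random variables $(X, Y)$ in (3.1), where $(U_j)_{j=1}^{m}$ are i.i.d. random variables satisfying (3.2) with
$\gamma \in (0,2)$, so that $\mathrm{Var}(U_j) = \infty$. Then,

\[ \rho_n \xrightarrow{\;d\;} \rho \equiv \frac{\sum_{j=1}^{m} \alpha_j \beta_j Z_j}{\sqrt{\sum_{j=1}^{m} \alpha_j^2 Z_j}\,\sqrt{\sum_{j=1}^{m} \beta_j^2 Z_j}}, \tag{3.3} \]

where $(Z_j)_{j=1}^{m}$ are i.i.d. random variables having stable distributions with parameter $\gamma/2 \in (0, 1)$, and
$\xrightarrow{\;d\;}$ denotes convergence in distribution. In particular, $\rho$ has a density on $[-1,1]$, which is strictly
positive on $(-1, 1)$ if there exist $k, l$ such that $\alpha_k \beta_k < 0 < \alpha_l \beta_l$, while the density is positive on $(a, 1)$
when $\alpha_k \beta_k \ge 0$ for every $k$, where

\[ a = \min_{S \subseteq \{1,2,\ldots,m\},\ |S| \ge 2} \frac{\sum_{j \in S} \alpha_j \beta_j}{\sqrt{\sum_{j \in S} \alpha_j^2}\,\sqrt{\sum_{j \in S} \beta_j^2}} \in (0, 1). \tag{3.4} \]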

In order to prove the theorem we need the following technical result: 

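Lemma 3.2 (Asymptotics of sums in stable domain). Let $(U_{i,j})_{i \in [n],\, j \in [2]}$ be i.i.d. random variables
satisfying (3.2) for some $\gamma \in (0,2)$. Then there exists a sequence $a_n$ with $a_n = n^{2/\gamma}\ell(n)$, where
$n \mapsto \ell(n)$ is slowly varying, such that

\[ \frac{1}{a_n}\sum_{i=1}^{n} U_{i,1}^2 \xrightarrow{\;d\;} Z_1, \qquad \frac{1}{a_n}\sum_{i=1}^{n} U_{i,1} U_{i,2} \xrightarrow{\;P\;} 0, \tag{3.5} \]

where $Z_1$ is stable with parameter $\gamma/2$ and $\xrightarrow{\;P\;}$ denotes convergence in probability.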



Proof. Denote by $F$ the distribution function of $U$. The proof of the first statement in (3.5) is
classical when we note that the distribution function of $U^2$ equals $u \mapsto F(\sqrt{u})$, which, by (3.2),
is in the domain of attraction of a stable $\gamma/2$ random variable. In particular, we can identify
$a_n = \big([1 - F]^{-1}(1/n)\big)^2$. To prove the second part of (3.5), we write

\[ 1 - F(x) = \mathbb{P}(U > x) \le c\,x^{-\gamma'}, \qquad x \ge 0, \tag{3.6} \]

which is valid for any $\gamma' \in (1, \gamma)$ by (3.2) and Potter's theorem. We next study the distribution
function of $U_1 U_2$, which we denote by $H$, where $U_1$ and $U_2$ are two independent copies of the random
variable $U$. When $F$ satisfies (3.6), then it is not hard to see that there exists a $C > 0$ such that

\[ 1 - H(u) \le C(1 + \log u)\,u^{-\gamma'}. \tag{3.7} \]
Indeed, assume that $F$ has a density $f(w) = c\,w^{-(\gamma'+1)}$ for $w > 1$. Then

\[ 1 - H(u) = \int_{1}^{\infty} f(w)\,[1 - F](u/w)\,dw. \]

Clearly, $1 - F(w) = c' w^{-\gamma'}$ for $w \ge 1$ and $1 - F(w) = 1$ otherwise. Substitution of this yields

\[ 1 - H(u) \le c c' u^{-\gamma'} \int_{1}^{u} w^{-1}\,dw + c \int_{u}^{\infty} w^{-(\gamma'+1)}\,dw \le C(1 + \log u)\,u^{-\gamma'}. \]

When $F$ satisfies (3.6), then $U_1$ and $U_2$ are stochastically upper bounded by $U_1^*$ and $U_2^*$ with
distribution function $F^*$ satisfying $1 - F^*(w) = c' w^{-\gamma'} \wedge 1$, where $(x \wedge y) = \min\{x, y\}$, and the claim in
(3.7) follows from the above computation.

By the bound in (3.7), the random variables $U_{i,1} U_{i,2}$ are stochastically bounded from above by
random variables $P_i$ that are in the domain of attraction of a stable $\gamma'$ random variable. As a result,
there exists $b_n = n^{1/\gamma'}\ell'(n)$, where $n \mapsto \ell'(n)$ is slowly varying, such that

\[ \frac{1}{b_n}\sum_{i=1}^{n} P_i \xrightarrow{\;d\;} W, \]

where $W$ is stable $\gamma'$. By choosing $\gamma' > \gamma/2$, we get $b_n/a_n \to 0$, so we obtain the second statement
in (3.5). □



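Proof of Theorem 3.1. We start by noting that

\[ \rho_n = \frac{\frac{1}{n-1}\sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1}\bar{X}_n \bar{Y}_n}{S_n(X)\,S_n(Y)} \tag{3.8} \]

and

\[ S_n^2(X) = \frac{1}{n-1}\sum_{i=1}^{n} \big(X_i^2 - \bar{X}_n^2\big), \qquad S_n^2(Y) = \frac{1}{n-1}\sum_{i=1}^{n} \big(Y_i^2 - \bar{Y}_n^2\big). \tag{3.9} \]

We continue to identify the asymptotic behavior of

\[ \sum_{i=1}^{n} X_i Y_i, \qquad \sum_{i=1}^{n} X_i^2, \qquad \sum_{i=1}^{n} Y_i^2. \]

The distribution of $((X_i, Y_i))_{i=1}^{n}$ is described in terms of an array $(U_{i,j})_{i \in [n],\, j \in [m]}$, which are i.i.d.
copies of a random variable $U$. In terms of these random variables, we can identify

\[ \sum_{i=1}^{n} X_i Y_i = \sum_{j=1}^{m} \alpha_j \beta_j \sum_{i=1}^{n} U_{i,j}^2 + \sum_{j_1 \ne j_2} \alpha_{j_1} \beta_{j_2} \sum_{i=1}^{n} U_{i,j_1} U_{i,j_2}. \tag{3.10} \]

The sums $\sum_{i=1}^{n} U_{i,j}^2$ are i.i.d. for different $j$, and by Lemma 3.2, $\sum_{i=1}^{n} U_{i,j_1} U_{i,j_2}$ is of a smaller order. Hence, from
(3.10) we obtain that

\[ \frac{1}{a_n}\sum_{i=1}^{n} X_i Y_i \xrightarrow{\;d\;} \sum_{j=1}^{m} \alpha_j \beta_j Z_j. \]

Therefore, by taking $\alpha = \beta$, we also obtain

\[ \frac{1}{a_n}\sum_{i=1}^{n} X_i^2 \xrightarrow{\;d\;} \sum_{j=1}^{m} \alpha_j^2 Z_j, \qquad \frac{1}{a_n}\sum_{i=1}^{n} Y_i^2 \xrightarrow{\;d\;} \sum_{j=1}^{m} \beta_j^2 Z_j, \tag{3.11} \]

and these convergences hold simultaneously. As a result, (3.3) follows. It remains to establish
properties of the limiting random variable $\rho$ in (3.3).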

The density of $Z_1$ is strictly positive on $(0, \infty)$, so that the density of $\rho$ is strictly positive on
$(-1, 1)$ when the sign of $\alpha_i \beta_i$ is both positive as well as negative. When $\alpha_i \beta_i \ge 0$ for every $i$, on the
other hand, the density of $\rho$ is strictly positive on the support of $\rho$, which is $(a, 1)$, where

\[ a = \inf_{z_1, \ldots, z_m} \frac{\sum_{j=1}^{m} \alpha_j \beta_j z_j}{\sqrt{\sum_{j=1}^{m} \alpha_j^2 z_j}\,\sqrt{\sum_{j=1}^{m} \beta_j^2 z_j}} \in (0,1). \tag{3.12} \]

Denote the function that is minimized by $a(z_1, \ldots, z_m)$. Note that rescaling $\tilde{z}_j = c z_j$,
$j = 1, \ldots, m$, does not change the value of $a(z_1, \ldots, z_m)$. In particular, we can choose
$c = (\max\{z_1, z_2, \ldots, z_m\})^{-1}$. Thus, without loss of generality, we can assume that $z_j \in [0,1]$,
$j = 1, 2, \ldots, m$, and $z_k = 1$ for at least one $k = 1, 2, \ldots, m$. In that case, $a$ is a continuous function of
$z_j \in [0, 1]$, $j \ne k$. Taking a derivative of $a$ with respect to $z_j$, we obtain that the sign of the derivative
is defined by the sign of the expression

\[ -a(z_1, \ldots, z_m)\big(\alpha_j^2 + \beta_j^2\big) + 2\alpha_j \beta_j. \tag{3.13} \]

Since (3.13) is decreasing in $a$, the derivative of $a(z_1, \ldots, z_m)$ w.r.t. $z_j$ cannot equal zero. (Indeed,
if (3.13) is zero in some point $z_j^*$, then (3.13) is positive on $(z_j^*, z_j^* + \varepsilon)$ for some small $\varepsilon$ only if $a$ is
decreasing on $[z_j^*, z_j^* + \varepsilon)$; thus, we obtain a contradiction.) We conclude that $a$ achieves its minimum
when all $z_j$'s equal either zero or one, and at least one of the values must equal one. Finally, if only
one value $z_j$ is equal to one and the rest are equal to zero, then we obtain $a = 1$, which is the maximal
possible value of $a$. Thus, at least two values of $z_j$ must equal one.

□

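To illustrate the result of Theorem 3.1, consider the example with $U_j$'s from a Pareto distribution
satisfying $\mathbb{P}(U > x) = 1/x^{1.1}$, $x \ge 1$, so $L(x) = 1$ and $\gamma = 1.1$ in (3.2). The exponent $\gamma = 1.1$ is as
observed for the World Wide Web [9]. In (3.1), we choose $m = 3$ and $\alpha_i, \beta_i$, $i = 1, 2, 3$, as specified
in Table 1. We generate $N$ data samples $((X_i, Y_i))_{i=1}^{n}$ and compute $\rho_n$ and $\rho_n^{\mathrm{rank}}$ for each of the
$N$ samples. Thus, we obtain the vectors $(\rho_{n,j})_{j=1}^{N}$ and $(\rho_{n,j}^{\mathrm{rank}})_{j=1}^{N}$ of $N$ independent realisations for $\rho_n$
and $\rho_n^{\mathrm{rank}}$, respectively, where the sub-index $j = 1, \ldots, N$ denotes the $j$th realization of $((X_i, Y_i))_{i=1}^{n}$.
We then compute

\[ E_N(\rho_n) = \frac{1}{N}\sum_{j=1}^{N} \rho_{n,j}, \qquad E_N(\rho_n^{\mathrm{rank}}) = \frac{1}{N}\sum_{j=1}^{N} \rho_{n,j}^{\mathrm{rank}}, \tag{3.14} \]

\[ \sigma_N(\rho_n) = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N} \big(\rho_{n,j} - E_N(\rho_n)\big)^2}, \qquad \sigma_N(\rho_n^{\mathrm{rank}}) = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N} \big(\rho_{n,j}^{\mathrm{rank}} - E_N(\rho_n^{\mathrm{rank}})\big)^2}. \tag{3.15} \]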

The results are presented in Table 1. We clearly see that $\rho_n$ has a significant standard deviation,
whose estimates are similar for different values of $n$. This means that in the limit as $n \to \infty$,
$\rho_n$ is a random variable with a significant spread in its values, as stated in Theorem 3.1. Thus, by
evaluating $\rho_n$ for one sample $((X_i, Y_i))_{i=1}^{n}$ we will obtain a random number, even when $n$ is huge.
The convergence to a non-trivial distribution is directly seen in Figure 1, because the plots for the two
values of $n$ almost coincide. Note that in all cases, the density is fairly uniform, ensuring a comparable
probability for all feasible values and rendering the value obtained in a specific realization even more
uninformative.

On the other hand, from Table 1 we clearly see that the behaviour of rank correlations is exactly
as we can expect from a good statistical estimator. The obtained average values are consistent while
the standard deviation of $\rho_n^{\mathrm{rank}}$ decreases approximately as $1/\sqrt{n}$ as $n$ grows large. Therefore, $\rho_n^{\mathrm{rank}}$
converges to a deterministic number.
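
The experiment described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the code used for Table 1: the helper names, the seed and the values $n = 1000$, $N = 30$ are hypothetical choices made to keep the run short, and the Pareto variables are drawn by inverse transform sampling.

```python
import math
import random
import statistics


def pareto(rng, gamma):
    """Draw U with P(U > x) = x**(-gamma) for x >= 1 (inverse transform)."""
    return (1.0 - rng.random()) ** (-1.0 / gamma)


def sample_correlation(xs, ys):
    """Sample correlation coefficient (2.1); the 1/(n-1) factors cancel."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(s_xx * s_yy)


def descending_ranks(values):
    """Rank 1 is given to the largest value; ties are not expected here."""
    order = sorted(range(len(values)), key=values.__getitem__, reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_rho(xs, ys):
    return sample_correlation(descending_ranks(xs), descending_ranks(ys))


def linear_sample(rng, alpha, beta, gamma, n):
    """n i.i.d. copies of (X, Y) from the linear model (3.1)."""
    xs, ys = [], []
    for _ in range(n):
        u = [pareto(rng, gamma) for _ in alpha]
        xs.append(sum(a * v for a, v in zip(alpha, u)))
        ys.append(sum(b * v for b, v in zip(beta, u)))
    return xs, ys


def experiment(alpha, beta, gamma=1.1, n=1000, N=30, seed=0):
    """Return N realisations of rho_n and of the rank correlation."""
    rng = random.Random(seed)
    pearson, rank = [], []
    for _ in range(N):
        xs, ys = linear_sample(rng, alpha, beta, gamma, n)
        pearson.append(sample_correlation(xs, ys))
        rank.append(spearman_rho(xs, ys))
    return pearson, rank


alpha = (1 / 2, -1 / 3, 1 / 6)
beta = (1 / 6, 1 / 2, -1 / 3)
pearson, rank = experiment(alpha, beta)
print("rho_n     mean %.4f  sd %.4f" % (statistics.mean(pearson), statistics.stdev(pearson)))
print("rho_rank  mean %.4f  sd %.4f" % (statistics.mean(rank), statistics.stdev(rank)))
```

Re-running the sketch with a larger `n` gives a rough feel for the contrast discussed in the text: the spread of the rank-based estimate should shrink, while the spread of $\rho_n$ is not expected to.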
Coefficients                      Statistic                              Values (one column per sample size; header row illegible)

$\alpha = (1/2, 1/3, 1/6)$        $E_N(\rho_n)$                          [illegible]   [illegible]   [illegible]   [illegible]
$\beta = (1/6, 1/3, 1/2)$         $\sigma_N(\rho_n)$                     [illegible]   [illegible]   [illegible]   [illegible]
                                  $E_N(\rho_n^{\mathrm{rank}})$          [illegible]   0.8850        0.8858        0.8856
                                  $\sigma_N(\rho_n^{\mathrm{rank}})$     0.0248        0.0073        0.0023        0.0007

$\alpha = (1/2, -1/3, 1/6)$       $E_N(\rho_n)$                          -0.3052       -0.3386       -0.3670       -0.3203
$\beta = (1/6, 1/2, -1/3)$        $\sigma_N(\rho_n)$                     0.6087        0.5841        0.5592        0.5785
                                  $E_N(\rho_n^{\mathrm{rank}})$          -0.3448       -0.3513       -0.3503       -0.3517
                                  $\sigma_N(\rho_n^{\mathrm{rank}})$     0.1202        0.0393        0.0120        0.0034

Table 1: Estimated mean and standard deviation of $\rho_n$ and $\rho_n^{\mathrm{rank}}$ in $N$ samples with linear dependence (3.1).



[Figure 1: two panels titled "Empirical CDF of $\rho_n$", each showing curves for $n = 1{,}000$ and $n = 10{,}000$; one panel for $\alpha = (1/2, 1/3, 1/6)$, $\beta = (1/6, 1/3, 1/2)$, the other for $\alpha = (1/2, -1/3, 1/6)$, $\beta = (1/6, 1/2, -1/3)$.]

Figure 1: The empirical distribution function $F_N(x) = \mathbb{P}(\rho_n \le x)$ for the $N = 1{,}000$ observed values of $\rho_n$
($n = 1{,}000$, $n = 10{,}000$), in the case of linear dependence (3.1).



4 Sample correlation coefficient for non-negative variables 

In this section, we investigate correlations between non-negative heavy-tailed random variables. Our
main result in this section shows that the correlation coefficient is asymptotically non-negative:

Theorem 4.1 (Asymptotic non-negativity of correlation coefficient for positive r.v.'s). Let $((X_i, Y_i))_{i=1}^{n}$
be i.i.d. copies of non-negative random variables $(X, Y)$, where $X$ and $Y$ satisfy

\[ \mathbb{P}(X > x) = L_X(x)\,x^{-\gamma_X}, \qquad \mathbb{P}(Y > y) = L_Y(y)\,y^{-\gamma_Y}, \qquad x, y \ge 0, \tag{4.1} \]

with $\gamma_X, \gamma_Y \in (0,2)$, so that $\mathrm{Var}(X) = \mathrm{Var}(Y) = \infty$. Then, any limit point of the sample correlation
coefficient is non-negative.

We illustrate Theorem 4.1 with a useful example. Let $(U_i)_{i=1}^{n}$ be a sequence of i.i.d. random
variables satisfying (3.2), and where $U \ge 0$ a.s. Let $(X, Y) = (0, 2U)$ with
probability 1/2 and $(X, Y) = (2U, 0)$ with probability 1/2. Then, $XY = 0$ a.s., while $\mathbb{E}[X] = \mathbb{E}[Y] = \mathbb{E}[U]$
and $\mathrm{Var}(X) = \mathrm{Var}(Y) = 2\mathbb{E}[U^2] - \mathbb{E}[U]^2 = 2\mathrm{Var}(U) + \mathbb{E}[U]^2$. Therefore, if $\mathrm{Var}(U) < \infty$,

\[ \rho_n \xrightarrow{\;P\;} \rho = -\frac{\mathbb{E}[U]^2}{2\mathrm{Var}(U) + \mathbb{E}[U]^2} \in (-1, 0). \tag{4.2} \]

The asymptotics in (4.2) is quite reasonable, since the random variables $(X, Y)$ are highly negatively
dependent: when $X > 0$, $Y$ must be equal to 0, and vice versa. Instead, when $(U_i)_{i=1}^{n}$ is a sequence
of i.i.d. random variables satisfying (3.2) for some $\gamma \in (0,2)$, and where $U \ge 0$ a.s., then $\rho_n \xrightarrow{\;P\;} 0$,
which is not what we would like.

Table 2 shows the empirical mean and standard deviation of the estimators $\rho_n$ and $\rho_n^{\mathrm{rank}}$. Here
$\mathbb{P}(U > x) = x^{-1.1}$, $x \ge 1$, as in Table 1. As predicted by Theorem 4.1, the sample correlation
coefficient (assortativity) converges to zero as $n$ grows large, while $\rho_n^{\mathrm{rank}}$ consistently shows a clear
negative dependence, and the precision of the estimator improves as $n \to \infty$. This explains why
strong disassortativity is not observed in large samples of power-law data.

$N$                                   [values and column spans illegible]
$n$                                   10         10^[?]     10^[?]     10^[?]     10^[?]

$E_N(\rho_n)$                         -0.4833    -0.1363    -0.0342    -0.0077    -0.0015
$\sigma_N(\rho_n)$                    0.1762     0.0821     0.0245     0.0064     0.0011

$E_N(\rho_n^{\mathrm{rank}})$         -0.6814    -0.4508    -0.4485    -0.4504    -0.4519
$\sigma_N(\rho_n^{\mathrm{rank}})$    0.1580     0.0283     0.0082     0.0024     0.0007

Table 2: The mean and standard deviation of $\rho_n$ and $\rho_n^{\mathrm{rank}}$ in $N$ simulations of $((X_i, Y_i))_{i=1}^{n}$, where $X = 2UI$,
$Y = 2U(1 - I)$, $I$ is a Bernoulli(1/2) random variable, $\mathbb{P}(U > x) = x^{-1.1}$, $x \ge 1$.
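
The behaviour of $\rho_n$ in this example can be explored with a short simulation. The sketch below is illustrative only and makes its own assumptions: the function names and the seed are hypothetical, the Pareto variables are drawn by inverse transform sampling, and only the sample correlation coefficient is computed. Since $X_i Y_i = 0$ for every $i$, the value of $\rho_n$ coincides with $-\frac{n}{n-1}\,\bar{X}_n \bar{Y}_n / (S_n(X) S_n(Y))$, which the code evaluates alongside.

```python
import math
import random


def pareto(rng, gamma):
    """Draw U with P(U > x) = x**(-gamma) for x >= 1 (inverse transform)."""
    return (1.0 - rng.random()) ** (-1.0 / gamma)


def sample_pairs(rng, gamma, n):
    """(X, Y) = (2U, 0) or (0, 2U), each with probability 1/2."""
    xs, ys = [], []
    for _ in range(n):
        value = 2.0 * pareto(rng, gamma)
        if rng.random() < 0.5:
            xs.append(value)
            ys.append(0.0)
        else:
            xs.append(0.0)
            ys.append(value)
    return xs, ys


def mean_and_std(values):
    """Sample average and sample standard deviation, as in (2.2)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, std


def sample_correlation(xs, ys):
    """Sample correlation coefficient (2.1)."""
    n = len(xs)
    (mean_x, s_x), (mean_y, s_y) = mean_and_std(xs), mean_and_std(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (s_x * s_y)


def mean_over_std_term(xs, ys):
    """-(n/(n-1)) * (mean_x / S_n(X)) * (mean_y / S_n(Y))."""
    n = len(xs)
    (mean_x, s_x), (mean_y, s_y) = mean_and_std(xs), mean_and_std(ys)
    return -(n / (n - 1)) * (mean_x / s_x) * (mean_y / s_y)


def finite_variance_limit(mean_u, var_u):
    """Right-hand side of (4.2); only meaningful when Var(U) is finite."""
    return -mean_u ** 2 / (2.0 * var_u + mean_u ** 2)


rng = random.Random(2024)
for n in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
    xs, ys = sample_pairs(rng, 1.1, n)
    print(n, sample_correlation(xs, ys), mean_over_std_term(xs, ys))
```

The function `finite_variance_limit` is included only to make (4.2) concrete; for instance, a variable with mean 1 and variance 1 gives the limit $-1/3$.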

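We next prove Theorem 4.1.

Proof of Theorem 4.1. Clearly $\sum_{i=1}^{n} X_i Y_i \ge 0$ when $X_i \ge 0$, $Y_i \ge 0$, so that, by (3.8),

\[ \rho_n \ge -\frac{n}{n-1}\,\frac{\bar{X}_n \bar{Y}_n}{S_n(X)\,S_n(Y)} = -\frac{n}{n-1}\,\frac{\bar{X}_n}{S_n(X)}\,\frac{\bar{Y}_n}{S_n(Y)}. \]

It remains to show that if $\mathrm{Var}(X) = \infty$, then $\bar{X}_n / S_n(X) \xrightarrow{\;P\;} 0$. Indeed, if $\gamma_X \in (1,2)$ then
$\bar{X}_n \to \mathbb{E}[X] < \infty$ by the strong law of large numbers, and from (4.1), (2.2) and Lemma 3.2 it follows that
$S_n^2(X) = n^{2/\gamma_X - 1 + o(1)} \to \infty$ as $n \to \infty$. When $\gamma_X \in (0,1]$, instead, $\bar{X}_n = n^{1/\gamma_X - 1 + o(1)}$, so that
$\bar{X}_n / S_n(X) = n^{-1/2 + o(1)} \to 0$. This proves the claim. □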

5 Applications to random graph models 

The correlation coefficient is particularly important in the setting of degree-degree correlations in
real-world networks. Let $G = (V, E)$ be a graph with vertex set $V$ and edge set $E$. The assortativity
coefficient of $G$ is equal to (see, e.g., [20, (4)])

\[ \rho(G) = \frac{\frac{1}{|E|}\sum_{ij \in E} D_i D_j - \Big(\frac{1}{|E|}\sum_{ij \in E} \frac{1}{2}(D_i + D_j)\Big)^2}{\frac{1}{|E|}\sum_{ij \in E} \frac{1}{2}(D_i^2 + D_j^2) - \Big(\frac{1}{|E|}\sum_{ij \in E} \frac{1}{2}(D_i + D_j)\Big)^2}, \]

where $D_i$ is the degree of vertex $i$ and the sum is over directed edges of $G$, i.e., $ij$ and $ji$ are two
distinct edges. The assortativity coefficient is equal to the correlation coefficient of the sequence
of random variables $((D_i, D_j))_{ij \in E}$. Thus, the assortativity coefficient is the correlation coefficient
between two sequences of non-negative random variables, as studied in Theorem 4.1. We refer to
[23] for an extensive introduction to networks, their empirical properties and models for them.

This section is organized as follows. In Section 5.1 we show that all limit points of the assortativity
coefficients for sequences of growing scale-free random graphs with power-law exponent $\gamma < 3$
are non-negative, a result that is similar in spirit to Theorem 4.1. We highlight this statement by
presenting theoretical and numerical results for several random graph examples where the assortativity
coefficient yields unexpected and unwanted results. In Section 5.2 we present an example of a
sequence of random graphs where the assortativity coefficient converges to a proper random variable,
as observed in the i.i.d. setting in Theorem 3.1.



5.1 No disassortative scale-free random graph sequences 

We compute that

\[ \frac{1}{|E|}\sum_{ij \in E} \frac{1}{2}(D_i + D_j) = \frac{1}{|E|}\sum_{i \in V} D_i^2, \qquad \frac{1}{|E|}\sum_{ij \in E} \frac{1}{2}(D_i^2 + D_j^2) = \frac{1}{|E|}\sum_{i \in V} D_i^3. \]

Thus, $\rho(G)$ can be written as

\[ \rho(G) = \frac{\sum_{ij \in E} D_i D_j - \frac{1}{|E|}\Big(\sum_{i \in V} D_i^2\Big)^2}{\sum_{i \in V} D_i^3 - \frac{1}{|E|}\Big(\sum_{i \in V} D_i^2\Big)^2}. \tag{5.1} \]

Consider a sequence of graphs $(G_n)_{n \ge 1}$, where $n$ denotes the number of vertices $n = |V|$ in the
graph. Since many real-world networks are quite large, we are interested in the behavior of $\rho(G_n)$
as $n \to \infty$. Note that this discussion applies both to sequences of real-world networks of increasing
size, as well as to graph sequences of random graphs. We start by generalizing Theorem 4.1 to this
setting:

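Theorem 5.1 (Asymptotic non-disassortativity of scale-free graphs). Let $(G_n)_{n \ge 1}$ be a sequence of
graphs of size $n$ satisfying that there exist $\gamma \in (1, 3)$ and $0 < c < C < \infty$ such that $cn \le |E| \le Cn$,
$c n^{1/\gamma} \le \max_{i \in [n]} D_i \le C n^{1/\gamma}$ and $c n^{(2/\gamma) \vee 1} \le \sum_{i \in [n]} D_i^2 \le C n^{(2/\gamma) \vee 1}$. Then, any limit point of the
assortativity coefficients $\rho(G_n)$ is non-negative.

Proof. We note that $D_i \ge 0$ for every $i$ so that, from (5.1),

\[ \rho(G_n) \ge -\frac{\frac{1}{|E|}\Big(\sum_{i \in V} D_i^2\Big)^2}{\sum_{i \in V} D_i^3 - \frac{1}{|E|}\Big(\sum_{i \in V} D_i^2\Big)^2}. \]

By assumption, $\sum_{i \in V} D_i^3 \ge (\max_{i \in [n]} D_i)^3 \ge c^3 n^{3/\gamma}$, whereas
$\frac{1}{|E|}\big(\sum_{i \in V} D_i^2\big)^2 \le (C^2/c)\,n^{2((2/\gamma) \vee 1) - 1} = (C^2/c)\,n^{(4/\gamma - 1) \vee 1}$.
Since $\gamma \in (1,3)$ we have $(4/\gamma - 1) \vee 1 < 3/\gamma$, so that

\[ \frac{\sum_{i \in V} D_i^3}{\frac{1}{|E|}\Big(\sum_{i \in V} D_i^2\Big)^2} \to \infty. \]

This proves the claim. □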

In the literature, many examples are reported of real-world networks where the degree distribution
obeys a power law (see [?, 22] for surveys of real-world networks and their degree properties). In
particular, for scale-free networks, the proportion $p_k$ of vertices of degree $k$ is close to $p_k \approx c k^{-\gamma - 1}$,
and most values of $\gamma$ reported in the literature are in $(1,3)$, see e.g., [?, Table I] or [22, Table I].
When this is the case, we can expect that

\[ |E| = \sum_{i \in V} D_i \sim \mu n, \]

where $\mu = \mathbb{E}[D]$, while $\max_{i \in V} D_i \sim n^{1/\gamma}$, and

\[ \frac{1}{n}\sum_{i \in V} D_i^p \sim \begin{cases} \mu_p & \text{when } \gamma > p, \\ c' n^{p/\gamma - 1} & \text{when } \gamma < p, \end{cases} \]

where $\mu_p = \mathbb{E}[D^p]$. In particular, the conditions of Theorem 5.1 hold when $\gamma < 3$.
Thus, the asymptotic degree-degree correlation of the graph sequence $(G_n)_{n \ge 1}$ is non-negative. As
a result, there exist no disassortative scale-free graph sequences.

We next consider four random graph models to highlight our result. In the remainder of this
section we first describe three models: the Configuration model, the Configuration model with intermediate
vertices, and the Preferential Attachment model. Then we present the numerical results for these models
in Table 3. As we see from the results, in all these models the assortativity converges to zero as $n$ grows.
The configuration model. The configuration model was invented by Bollobás in [5], inspired by
[1]. Its connectivity structure was first studied by Molloy and Reed [18, ?]. It was popularized by
Newman, Strogatz and Watts [?], who realized that it is a useful and simple model for real-world
networks. Given a degree sequence, namely a sequence of $n$ positive integers $d = (d_1, d_2, \ldots, d_n)$ with
$\ell_n = \sum_{i \in [n]} d_i$ assumed to be even, the configuration model (CM) on $n$ vertices and degree sequence
$d$ is constructed as follows:

Start with $n$ vertices and $d_i$ half-edges adjacent to vertex $i$. The graph is constructed by randomly
pairing each half-edge to some other half-edge to form an edge. Number the half-edges from 1 to $\ell_n$
in some arbitrary order. Then, at each step, two half-edges that are not already paired are chosen
uniformly at random among all the unpaired half-edges and are paired to form a single edge in
the graph. These half-edges are removed from the list of unpaired half-edges. We continue with
this procedure of choosing and pairing two unpaired half-edges until all the half-edges are paired.
Although self-loops may occur, these become rare as $n \to \infty$ (see e.g. [6] or [12] for more precise
results in this direction). We consider both the case where the self-loops are removed and multiple
edges are collapsed to a single edge, and the setting where we keep the self-loops and multiple edges.
As we will see in the simulations, these two cases are qualitatively similar.

We investigate the CM where the degrees are i.i.d. random variables, and note that the probability
that two vertices $i$ and $j$ are directly connected is close to $d_i d_j / \ell_n$. Since this is of product form in $i$
and $j$, the degrees at either end of an edge are close to being independent, and in fact are asymptotically
independent. Therefore, one expects the assortativity coefficient of the configuration model to
converge to 0 in probability, irrespective of the degree distribution.
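
The construction above, together with the assortativity coefficient in the form (5.1), can be sketched as follows. This is an illustrative sketch under our own assumptions rather than the simulation code behind the paper: the names are hypothetical, the uniform pairing is obtained by shuffling the list of half-edges and pairing consecutive entries, self-loops and multiple edges are kept, the degrees are the integer parts of Pareto variables, and an odd degree sum is made even by adding one half-edge to the last vertex.

```python
import random


def sample_degrees(rng, n, gamma):
    """I.i.d. degrees: integer part of U with P(U > x) = x**(-gamma), x >= 1."""
    degrees = [int((1.0 - rng.random()) ** (-1.0 / gamma)) for _ in range(n)]
    if sum(degrees) % 2 == 1:
        degrees[-1] += 1  # assumption: fix parity so all half-edges can pair
    return degrees


def configuration_model(degrees, rng):
    """Pair half-edges uniformly at random; returns undirected edges."""
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(half_edges)
    return [(half_edges[k], half_edges[k + 1])
            for k in range(0, len(half_edges), 2)]


def assortativity(edges):
    """Assortativity coefficient via (5.1); each edge counts in both directions.

    The denominator vanishes when all non-isolated vertices have equal degree.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1  # a self-loop adds 2 to its vertex
    directed_edges = 2 * len(edges)
    sum_products = 2 * sum(degree[u] * degree[v] for u, v in edges)
    sum_squares = sum(d ** 2 for d in degree.values())
    sum_cubes = sum(d ** 3 for d in degree.values())
    correction = sum_squares ** 2 / directed_edges
    return (sum_products - correction) / (sum_cubes - correction)


rng = random.Random(7)
for n in (10 ** 2, 10 ** 3, 10 ** 4):
    degrees = sample_degrees(rng, n, gamma=2.0)
    print(n, assortativity(configuration_model(degrees, rng)))
```

As a sanity check on the formula, a star graph has assortativity $-1$, and a path on four vertices has assortativity $-1/2$.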



Configuration model with intermediate vertices. We next adapt the configuration model
slightly, by replacing every edge by two edges that meet at a middle vertex. Denote this graph by
$\bar{G}_n = (\bar{V}_n, \bar{E}_n)$, while the configuration model is $G_n = (V_n, E_n)$. In this model, there are $n + \ell_n/2$
vertices and $2\ell_n$ edges (recall that $ij$ and $ji$ are two different edges). For $st \in \bar{E}_n$, the degree of either
vertex $s$ or vertex $t$ equals 2, and the degree of the other vertex in the edge is equal to $D_i$, where $i$
is the unique vertex in the original configuration model that corresponds to $s$ or $t$. Therefore,

\[ \frac{1}{|\bar{E}_n|}\sum_{st \in \bar{E}_n} D_s D_t = \frac{2}{\ell_n}\sum_{i \in V_n} D_i^2 \xrightarrow{\;P\;} \frac{2\mu_2}{\mu_1}, \]

and for $p \ge 1$,

\[ \frac{1}{|\bar{E}_n|}\sum_{st \in \bar{E}_n} \frac{1}{2}\big(D_s^p + D_t^p\big) = 2^{p-1} + \frac{1}{2\ell_n}\sum_{i \in V_n} D_i^{p+1} \xrightarrow{\;P\;} 2^{p-1} + \frac{\mu_{p+1}}{2\mu_1}, \]

where $\mu_p = \mathbb{E}[D^p]$. As a result, when $\gamma > 3$,

\[ \rho(\bar{G}_n) \xrightarrow{\;P\;} \frac{2\mu_2/\mu_1 - \big(1 + \mu_2/(2\mu_1)\big)^2}{\big(2 + \mu_3/(2\mu_1)\big) - \big(1 + \mu_2/(2\mu_1)\big)^2} < 0. \]

The fact that the degree-degree correlation is negative is quite reasonable, since in this model, vertices
of high degree are only connected to vertices of degree 2, so that there is negative dependence between
the degrees at either end of an edge. When $\gamma < 3$, on the other hand, $\mu_3 = \mathbb{E}[D^3] = \infty$, and thus

\[ \rho(\bar{G}_n) \xrightarrow{\;P\;} 0, \]

which is inappropriate, as the negative dependence of the degrees persists.



Preferential Attachment model. We consider the basic version of the undirected Preferential
Attachment model (PAM), where each new vertex adds only one edge to the network, connecting
to the existing nodes with probability proportional to their degrees. In this case, it is well known
that $\gamma = 2$ (see e.g., [2] or [7]). We see that the assortativity converges to zero, as indicated in
Theorem 5.1, while Spearman's rank correlation indicates that degrees are negatively dependent.
This can be understood by noting that the majority of edges of vertices with high degrees, which are
old vertices, come from vertices which are added late in the graph growth process and thus have small
degree. On the other hand, by the growth mechanism of the PAM, vertices with low degree are more
likely to be connected to vertices having high degree, which indeed suggests negative degree-degree
dependencies.
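
A minimal sketch of the growth rule just described is given below. It is not the authors' implementation: the function name is hypothetical, and the initial graph (two vertices joined by a single edge) is our own assumption, since the text does not specify how the process is started. Choosing a uniformly random entry from the list of all edge endpoints selects an existing vertex with probability proportional to its degree.

```python
import random


def preferential_attachment(n, rng):
    """Grow a tree on n >= 2 vertices; each new vertex adds exactly one edge."""
    edges = [(0, 1)]      # assumed initial graph: a single edge
    endpoints = [0, 1]    # vertex v appears here once per unit of degree
    for new_vertex in range(2, n):
        # A uniform endpoint is a degree-biased choice of an existing vertex.
        target = rng.choice(endpoints)
        edges.append((new_vertex, target))
        endpoints.extend((new_vertex, target))
    return edges


rng = random.Random(5)
edges = preferential_attachment(10 ** 4, rng)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print("edges:", len(edges), "largest degree:", max(degree.values()))
```

In this sketch every edge joins a newer vertex to an older one, which is the mechanism behind the negative dependence discussed above: the many late, low-degree vertices attach preferentially to the few old, high-degree ones.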



Numerical results. To illustrate our results, we have generated the two configuration models
and the Preferential Attachment model of different sizes. For fair comparison, we chose $\gamma = 3$ for
the configuration model: $\mathbb{P}(D > x) = x^{-\gamma}$, $x \ge 1$. Since in the configuration model self-loops and
multiple edges are possible, we considered two versions: the original model with self-loops and double
edges present, and the model where self-loops and double edges are removed.

The rank correlation coefficient $\rho^{\mathrm{rank}}(G)$ is computed using (2.3) as follows. We define the
random variables $X$ and $Y$ as the degrees on two ends of a random undirected edge in a graph (that
is, when rank correlations are computed, $ij$ and $ji$ represent the same edge). For each edge, when
the observed degrees are $a$ and $b$, we assign $[X = a, Y = b]$ or $[X = b, Y = a]$ with probability
1/2. Furthermore, many values of $X$ and $Y$ will be the same because a degree $d$ will appear at the
end of an edge $d$ times. We resolve the ties by adding independent random variables, uniformly
distributed on $[0, 1]$, to each value of $X$ and $Y$. The results are presented in Table 3.
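
The procedure for $\rho^{\mathrm{rank}}(G)$ described above can be sketched as follows. The code is a hedged illustration with hypothetical names, not the authors' implementation; the star graph used as input is our own toy example, chosen because every edge joins the high-degree centre to a vertex of degree one.

```python
import math
import random


def descending_ranks(values):
    """Rank 1 is given to the largest value."""
    order = sorted(range(len(values)), key=values.__getitem__, reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def correlation(xs, ys):
    """Sample correlation coefficient; the 1/(n-1) factors cancel."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(s_xx * s_yy)


def edge_rank_correlation(edges, rng):
    """Spearman's rho of the degrees at the two ends of undirected edges."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        a, b = degree[u], degree[v]
        if rng.random() < 0.5:       # orient the edge at random
            a, b = b, a
        xs.append(a + rng.random())  # uniform noise on [0, 1) breaks ties
        ys.append(b + rng.random())  # without reordering distinct degrees
    return correlation(descending_ranks(xs), descending_ranks(ys))


# Toy input (hypothetical): a star with 2000 leaves attached to vertex 0.
star = [(0, leaf) for leaf in range(1, 2001)]
print(edge_rank_correlation(star, random.Random(9)))
```

Because the degrees are integers, adding noise strictly below one never changes the order of two distinct degrees; it only assigns a random order within groups of equal degrees.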



N: 10^…, 10^…, 10

Model                                  Statistic                                 n: 10^…     10^…       10^…       10^…

Configuration model                    $\mathbb{E}_N(\rho(G_n))$                 0.0021     -0.0013     0.0001    -0.0003
with self-loops and double edges       $\sigma_N(\rho(G_n))$                     0.0672      0.0212     0.0068     0.0024
                                       $\mathbb{E}_N(\rho^{\rm rank}(G_n))$      0.0012     -0.0010    -0.0002    -0.0002
                                       $\sigma_N(\rho^{\rm rank}(G_n))$          0.0656      0.0202     0.0066     0.0014

Configuration model                    $\mathbb{E}_N(\rho(G_n))$                -0.0785     -0.0346    -0.0115    -0.0046
without self-loops and double edges    $\sigma_N(\rho(G_n))$                     0.0686      0.0274     0.0102     0.0039
                                       $\mathbb{E}_N(\rho^{\rm rank}(G_n))$     -0.0615     -0.0151    -0.0040    -0.0002
                                       $\sigma_N(\rho^{\rm rank}(G_n))$          0.0836      0.0337     0.0075     0.0024

Configuration model                    $\mathbb{E}_N(\rho(G_n))$                -0.2589     -0.1243    -0.0587    -0.0303
with intermediate vertices             $\sigma_N(\rho(G_n))$                     0.0872      0.0509     0.0255     0.0189
                                       $\mathbb{E}_N(\rho^{\rm rank}(G_n))$     -0.7482     -0.7499    -0.7498    -0.7501
                                       $\sigma_N(\rho^{\rm rank}(G_n))$          0.0121      0.0036     0.0011     0.0006

Preferential attachment                $\mathbb{E}_N(\rho(G_n))$                -0.2597     -0.1302    -0.0607    -0.0294
                                       $\sigma_N(\rho(G_n))$                     0.0550      0.0261     0.0127     0.0088
                                       $\mathbb{E}_N(\rho^{\rm rank}(G_n))$     -0.4167     -0.4151    -0.4166    -0.4158
                                       $\sigma_N(\rho^{\rm rank}(G_n))$          0.0695      0.0202     0.0066     0.0022

Table 3: Estimated mean and standard deviation of $\rho(G)$ and $\rho^{\rm rank}(G)$ in random graphs 



Within the same model, the graphs of different sizes are constructed by the same algorithm. 
Thus, their mixing patterns are exactly the same. As we predicted, the assortativity reduces in 
absolute value with the graph size, resulting in asymptotically neutral mixing for all models. On 
the contrary, the rank correlation coefficient consistently shows neutral mixing for the configuration 
model, moderately disassortative mixing for the Preferential Attachment graph, and strongly 
disassortative mixing for the configuration model with intermediate vertices. 

5.2 Random graphs with asymptotically random assortativity 



Finally, we discuss the possibility that $\rho(G_n)$ in (5.1) converges to a random variable when the 
number of vertices tends to infinity. Under the assumptions of Theorem 5.1, we have that 

$$\sum_{ij\in E} D_i D_j \leq C n^{1/\gamma + (2/\gamma \vee 1)}, \qquad \sum_{ij\in E} D_i D_j \geq \max_{i\in V} D_i^2 \geq c\, n^{2/\gamma}. \tag{5.2-5.4}$$

Further, from the proof of Theorem 5.1 we know that 

$$\sum_{i\in V} D_i^3 \geq \Big(\max_{i\in V} D_i\Big)^3 \geq c\, n^{3/\gamma}, \tag{5.5}$$

and 

$$\frac{1}{|E|}\Big(\sum_{i\in V} D_i^2\Big)^2 \leq C n^{(4/\gamma-1)\vee 1}, \tag{5.6}$$

where we see that (5.6) is vanishing compared to (5.5). The convergence of (5.1) to a random variable 
can only take place if the crossproducts on the left-hand side of (5.2)-(5.4) are of the same order of 
magnitude as the left-hand side of (5.5). As we see from the above, this is possible for $\gamma \in (1,3)$. 
However, the convergence will be slow because an easy calculation shows that the maximal difference 
in the order of magnitude between the right-hand sides of (5.2) and (5.6) is $n^{1/2}$. 
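
The "easy calculation" can be reproduced numerically. The sketch below takes the exponent of the upper bound on the crossproducts to be $1/\gamma + (2/\gamma \vee 1)$ and the exponent of the centering term to be $(4/\gamma - 1) \vee 1$; this reading of the two bounds is an assumption, and the function names are hypothetical.

```python
def crossproduct_exponent(gamma):
    """Exponent of n in the upper bound on the sum of D_i * D_j over edges."""
    return 1.0 / gamma + max(2.0 / gamma, 1.0)


def centering_exponent(gamma):
    """Exponent of n in the bound on (sum of D_i^2)^2 / |E|."""
    return max(4.0 / gamma - 1.0, 1.0)


def exponent_gap(gamma):
    """Difference between the two exponents above."""
    return crossproduct_exponent(gamma) - centering_exponent(gamma)


# Scan gamma over a grid in (1, 3) and locate the largest gap.
grid = [1.0 + k / 1000 for k in range(1, 2000)]
best = max(grid, key=exponent_gap)
```

Under this reading the gap equals $1 - 1/\gamma$ for $\gamma \leq 2$ and $1/\gamma$ for $\gamma \geq 2$, so it is largest at $\gamma = 2$, where it equals $1/2$.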



Below we present an example where we prove that $\rho(G_n)$ indeed converges to a random variable, 
and illustrate numerically how the distribution of $\rho(G_n)$ changes as $n$ grows large. However, due to 
the slow convergence, a substantially larger computational capacity is needed in order to (almost) 
achieve the limiting distribution. 

A collection of complete bipartite graphs. Take $((X_i, Y_i))_{i=1}^n$ to be an i.i.d. sample of random 
variables as in (3.1), where $\alpha_1 = \alpha_2 = \beta_1 = b$, $\beta_2 = ab$ for some $b > 0$ and $a > 1$. Then, for 
$i = 1, \ldots, n$, we create a complete bipartite graph of $X_i$ and $Y_i$ vertices, respectively. These $n$ 
complete bipartite graphs are not connected to one another. We denote such a collection of $n$ bipartite 
graphs by $G_n$. The graph $G_n$ has $|V| = \sum_{i=1}^n (X_i + Y_i)$ vertices and $|E| = 2\sum_{i=1}^n X_i Y_i$ edges. Further, 

$$\sum_{i\in V} D_i^p = \sum_{i=1}^n \big(X_i^p Y_i + Y_i^p X_i\big), \qquad \sum_{ij\in E} D_i D_j = 2\sum_{i=1}^n (X_i Y_i)^2.$$

Assume that the $U_j$ satisfy (3.2) with $\gamma \in (3,4)$, so that $\mathbb{E}[U^3] < \infty$, but $\mathbb{E}[U^4] = \infty$. As a result, 
$|E|/n \to 2\mathbb{E}[XY] < \infty$ and $\frac{1}{n}\sum_{i\in V} D_i^2 \to \mathbb{E}[XY(X + Y)] < \infty$. Further, 

$$n^{-4/\gamma} b^{-4} \sum_{i=1}^n \big(X_i^3 Y_i + Y_i^3 X_i\big) \xrightarrow{d} (a^3 + a) Z_1 + 2 Z_2, \qquad n^{-4/\gamma} b^{-4} \sum_{i=1}^n (X_i Y_i)^2 \xrightarrow{d} a^2 Z_1 + Z_2,$$

where $Z_1$ and $Z_2$ are two independent stable random variables with parameter $\gamma/4$. As a result, 

$$\rho(G_n) \xrightarrow{d} \frac{2a^2 Z_1 + 2 Z_2}{(a + a^3) Z_1 + 2 Z_2} \qquad \text{as } n \to \infty,$$

which is a proper random variable taking values in $(2a/(1 + a^2), 1)$. 

Statistic                                 n: 10^…     10^…       10^…       10^…

$\mathbb{E}_N(\rho(G_n))$                 0.6855      0.7293     0.7877     0.8224
$\sigma_N(\rho(G_n))$                     0.1389      0.0681     0.0614     0.0629
$\mathbb{E}_N(\rho^{\rm rank}(G_n))$      0.7556      0.8370     0.8577     0.8641
$\sigma_N(\rho^{\rm rank}(G_n))$          0.0791      0.0379     0.0247     0.0128

Table 4: Estimated mean and standard deviation of $\rho(G_n)$ and $\rho^{\rm rank}(G_n)$ for the collection of $n$ complete 
bipartite graphs. The number of realizations for each graph size is $N = 100$. 



In Table 4 we present numerical results for $\rho(G_n)$ and $\rho^{\rm rank}(G_n)$. Here we choose $b = 1/2$, $a = 2$, 
and $U$ has a generalized Pareto distribution $\mathbb{P}(U > x) = ((1.8 + x)/2.8)^{-\gamma}$, $x \geq 1$. 
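
Sampling from this model is straightforward by inverse transform. The sketch below assumes that the tail is $\mathbb{P}(U > x) = ((1.8 + x)/2.8)^{-\gamma}$ for $x \geq 1$ with $\gamma$ left as a parameter in $(3,4)$, and that (3.1) gives $X = b(U_1 + U_2)$ and $Y = b(U_1 + aU_2)$ for independent copies $U_1, U_2$ of $U$. The function names are hypothetical, and $X$ and $Y$ are treated as real-valued sizes.

```python
import random


def sample_u(gamma, rng):
    """Inverse-transform draw with P(U > x) = ((1.8 + x) / 2.8) ** (-gamma), x >= 1."""
    v = 1.0 - rng.random()                 # uniform on (0, 1]
    return 2.8 * v ** (-1.0 / gamma) - 1.8


def sample_pair(a, b, gamma, rng):
    """One pair (X, Y) with X = b * (U1 + U2) and Y = b * (U1 + a * U2)."""
    u1 = sample_u(gamma, rng)
    u2 = sample_u(gamma, rng)
    return b * (u1 + u2), b * (u1 + a * u2)


# Example: 1000 component sizes with a = 2, b = 1/2 and an illustrative gamma.
rng = random.Random(0)
pairs = [sample_pair(2.0, 0.5, 3.5, rng) for _ in range(1000)]
```

The value $\gamma = 3.5$ in the example is only a placeholder inside the admissible range; since $a > 1$, every sampled pair satisfies $Y > X$.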

Note that in this model there is a genuine dependence between the correlation measure and 
the graph size. Indeed, if $n = 1$ then the assortativity coefficient equals $-1$ because nodes with 
larger degrees are connected to nodes with smaller degrees. However, when the graph size grows, the 
positive correlations start dominating because of the positive linear dependence between $X$ and $Y$. 
We see that again the rank correlation captures the relation faster and gives consistent results with 
decreasing dispersion of values. Finally, Figure 2 shows the changes in the empirical distribution of 
$\rho(G_n)$ as $n$ grows. It is clear that a part of the probability mass is spread over the interval (0.8, 1). 
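
Both observations can be checked with a short computation. In component $i$ there are $X_i$ vertices of degree $Y_i$ and $Y_i$ vertices of degree $X_i$, so all sums entering the assortativity have closed forms. The sketch assumes that (5.1) is the usual degree assortativity written with sums over directed edges; the function names are hypothetical.

```python
def assortativity_bipartite_collection(pairs):
    """Assortativity of a disjoint union of complete bipartite graphs.

    pairs: list of (X_i, Y_i), the two side sizes of component i.
    Each undirected edge is counted in both directions.
    """
    n_edges = 2.0 * sum(x * y for x, y in pairs)                 # |E|
    cross = 2.0 * sum((x * y) ** 2 for x, y in pairs)            # sum of D_i * D_j
    sum_d2 = sum(x * y * (x + y) for x, y in pairs)              # sum of D_i^2
    sum_d3 = sum(x * y * (x * x + y * y) for x, y in pairs)      # sum of D_i^3
    centering = sum_d2 ** 2 / n_edges
    return (cross - centering) / (sum_d3 - centering)


def limit_ratio(a, z1, z2):
    """Value of the limiting random variable for given Z_1 = z1, Z_2 = z2."""
    return (2 * a * a * z1 + 2 * z2) / ((a + a ** 3) * z1 + 2 * z2)
```

For a single component with $X \neq Y$ the function returns $-1$, while for $a = 2$ the limiting ratio always lies between $2a/(1 + a^2) = 0.8$ and 1, in line with the interval mentioned above.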



[Figure: Empirical CDF of $\rho(G_n)$] 

Figure 2: The empirical distribution function $\mathbb{P}(\rho(G_n) < x)$ for the $N = 100$ observed values of $\rho(G_n)$, where 
$G_n$ is a collection of $n$ complete bipartite graphs. 

In the limit, $\rho(G_n)$ has a non-zero density on this interval. The difference between the crossproducts 
and the expectation squared in $\rho(G_n)$ is only of the order $n^{1/\gamma}$ in our example; thus, 
the convergence is too slow to be observed at $n = 100{,}000$. 



6 Discussion 



In this paper, we have investigated dependency measures for power-law random variables. We have 
argued that the correlation coefficient, despite its appealing feature that it is always in $[-1,1]$, is 
inappropriate to describe dependencies between heavy-tailed random variables since it yields 
nonsensical results. Indeed, the two main problems with the sample correlation coefficient are that (a) it 
can converge to a proper random variable when the sample size tends to infinity, indicating that it 
fluctuates tremendously for different samples, and (b) it is always asymptotically non-negative 
when dealing with non-negative random variables (even when these are obviously negatively 
dependent). In the context of random graphs, the first deficiency means that the assortativity can have a 
non-vanishing variance even when the size of the graph is huge; the second means that there do not 
exist asymptotically disassortative scale-free graphs. We give proofs for the facts stated above, and 
illustrate the results using simulations. 

Further research is needed to study rank correlations on graphs. Although the numerical results 
suggest consistency of the Spearman's rho estimator on random graphs, the values of the vectors 
$(X_i, Y_i)_{ij \in E}$ are in general not independent. Indeed, if one edge emanates from a node with degree 
250, then there are 499 other edges ($ij$ and $ji$ being different edges) for which either $X_i$ or $Y_i$ is equal 
to 250. Thus, the consistency of the estimator does not follow in general and needs a case-by-case 
proof. 

Rank correlations are a special case of the broader concept of copulas, which are widely used 
in multivariate analysis, in particular in applications in mathematical finance and risk management. 
There is a heated discussion in this area about the adequacy and informativeness of such measures, see 
e.g. [E] and subsequent reactions. There are several points of criticism. In particular, Spearman's rho 
uses a rank transformation, which changes the observed values of the degrees. First, what 
exactly does Spearman's rho tell us about the dependence between the original values? Second, 
no substantial justification exists for the rank transformation, besides its mathematical convenience. 
We thus do not claim that Spearman's rho is the solution for the problem. The main point of this 
paper is rather that the assortativity coefficient is not a solution at all, and that better solutions 
must be sought and can be found. 

Raising the discussion to a higher level, random variables $X$ and $Y$ are positively dependent 
when a large realization of $X$ typically implies a large realization of $Y$. A strong form of this notion 
is when $\mathbb{P}(X > x, Y > y) \geq \mathbb{P}(X > x)\mathbb{P}(Y > y)$ for every $x, y \in \mathbb{R}$, but for many purposes this notion 
is too restrictive. The covariance for non-negative random variables is obtained by integrating the 
above inequality over $x, y > 0$, so that it is true for 'typical' values of $x, y$. In many cases, however, 
we are particularly interested in certain values of $x, y$. Another class of methods for measuring 
rank correlations is based on the angular measure, a notion originating in the theory of multivariate 
extremes, for which the above inequality is investigated for large $x$ and $y$, so that it describes the 
tail dependence for a random vector $(X, Y)$, that is, the dependence between extremely large values 
of $X$ and $Y$, see e.g. [25]. Such tail dependence is characterized by an angular measure on [0, 1]. 
Informally, a concentration of the angular measure around the points 0 and 1 indicates independence 
of large values, while concentration around some other number $a \in (0, 1)$ suggests that a certain 
fraction of large values of $Y$ comes together with large values of $X$. In [27, 28] a first attempt was 
made to compute the angular measure between the in-degree of a node and its importance measured 
by the Google PageRank algorithm. Strikingly, completely different dependence structures were 
discovered in Wikipedia (independence), Preferential Attachment networks (complete dependence) 
and the Web (intermediate case). 
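
The remark above that the covariance of non-negative random variables is obtained by integrating the positive-dependence inequality can be made explicit. The following is a standard identity, stated here under the added assumption that $X$ and $Y$ are non-negative with finite second moments; it follows from writing $X = \int_0^\infty \mathbf{1}\{X > x\}\,dx$ and applying Fubini's theorem.

```latex
\mathbb{E}[XY] = \int_0^\infty\!\!\int_0^\infty \mathbb{P}(X > x,\, Y > y)\,dx\,dy,
\qquad
\mathbb{E}[X]\,\mathbb{E}[Y] = \int_0^\infty\!\!\int_0^\infty \mathbb{P}(X > x)\,\mathbb{P}(Y > y)\,dx\,dy,

\operatorname{Cov}(X, Y) = \int_0^\infty\!\!\int_0^\infty
\Big[\mathbb{P}(X > x,\, Y > y) - \mathbb{P}(X > x)\,\mathbb{P}(Y > y)\Big]\,dx\,dy .
```

In particular, if the integrand is non-negative for all $x, y > 0$, then the covariance is non-negative, whereas a non-negative covariance only says that the integrand is non-negative on average.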